Measuring Impact in an Experiment with Partial Compliance: HIV Status and Changes in Sexual Behavior

RCT
R
Stata
Author: Satish Bajracharya

Published: March 21, 2024

We use data from the study “The Demand for, and Impact of, Learning HIV Status” in Malawi. The study is a randomized controlled trial (RCT) in which individuals who took an HIV test were randomly offered monetary incentives of varying amounts to learn their HIV status.

Note

Study: Thornton, Rebecca L. 2008. “The Demand for, and Impact of, Learning HIV Status.” American Economic Review, 98 (5): 1829-63.

Data file: Click here

Detailed description of the intervention: Click here

For the analysis, we use the “Thornton HIV Testing Data.dta” file.

Import the data

Execution in R

The data file is a Stata (.dta) file. To import it into R, we use the read_dta() function from the haven package. We also use dplyr for data manipulation and estimatr for regressions with robust standard errors. Run the following code in R to install the three packages:

install.packages(c("haven", "dplyr", "estimatr"))

The downloaded files come with a readme document, which gives a detailed description of the variables used in the study.

library(haven)
library(dplyr)
library(estimatr) # lm_robust() and iv_robust() for robust standard errors
# import the .dta file
data <- read_dta("C:/Data analysis/Thornton data/Data/Thornton HIV Testing Data.dta")
Execution in Stata

Use the cd command to set the working directory, and then load the dataset with use.

cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear 


. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear 

Create treatment variable

The variable tinc records the amount of the monetary incentive offered to each respondent. We create a variable called treatment that takes on a value of 1 if the respondent was offered any financial incentive (tinc > 0) and 0 otherwise, and we label the values 0 and 1 as Control and Treatment.

Execution in R
data_1 <- data |>
  filter(!is.na(tinc)) |> #remove na in tinc
  mutate(treatment = ifelse(tinc > 0, 1, 0)) # create treatment variable
data_1$treatment <- factor(data_1$treatment, 
                       levels = c(0, 1),
                       labels = c("Control", "Treatment"))
Execution in Stata
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment


. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear

. generate treatment = cond(tinc>0, 1, 0)

. label define treatment 0 "Control" 1 "Treatment"

. label val treatment treatment
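One difference between the two versions is worth knowing. The R code drops rows with a missing tinc before creating treatment, while the Stata code does not (those rows are dropped in a later step). In Stata, a missing value compares as larger than any number, so cond(tinc>0, 1, 0) sets treatment to 1 when tinc is missing; appending if !missing(tinc) to the generate command would leave treatment missing instead. In R, a comparison with NA returns NA. The short sketch below illustrates the R behaviour on a hypothetical vector of incentive amounts (made-up values, not study data).

```r
# Hypothetical incentive amounts (not study data), including one missing value
tinc <- c(0, 50, NA, 100)

# NA > 0 evaluates to NA, so ifelse() returns NA for that element
treatment <- ifelse(tinc > 0, 1, 0)
print(treatment)

# factor() leaves the NA unassigned rather than placing it in either group
treatment_f <- factor(treatment, levels = c(0, 1),
                      labels = c("Control", "Treatment"))
print(table(treatment_f, useNA = "ifany"))
```

Filtering out missing tinc first, as the R code above does, keeps these rows out of both groups from the start.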

Calculating the compliance rate

In this analysis, we estimate the effect of learning one’s HIV status on the decision to purchase condoms, focusing on the subgroup of respondents who are sexually active and HIV positive. We therefore restrict the sample to these respondents and calculate the compliance rate within this subgroup.

Note

Baseline data were collected in 2004 and follow-up data in 2005.

Variable description:

treatment: takes on the value of 1 if the individual was offered a monetary incentive and 0 otherwise.
hadsex12: Indicator for reporting sex in the past 12 months at baseline (1 = Yes, 0 = No).
hiv2004: HIV test result (1 = HIV positive, 0 = HIV negative, -1 = Indeterminate)
got: Indicator for having obtained one’s HIV results (1 = learned HIV results)
anycond: Indicator for having purchased any condoms at the follow-up survey

Execution in R
data_1 <- data_1 |>
  filter(hadsex12 == 1,   # sexually active in the past 12 months (baseline)
         hiv2004 == 1,    # HIV positive at baseline
         !is.na(got),     # drop missing values of got
         !is.na(anycond)) # drop missing values of anycond
# followed_treatment = 1 if behavior matches assignment:
# got in the treatment group, 1 - got in the control group
data_1 <- data_1 |>
  mutate(followed_treatment = ifelse(treatment == "Treatment", got, 1 - got))
# tabulate followed_treatment in the treatment group
trt_dat <- data_1 |>
  filter(treatment == "Treatment") |>
  select(followed_treatment) |>
  group_by(followed_treatment) |>
  summarize(Count = n()) |>
  mutate(Percent = Count/sum(Count))
print(trt_dat) 
# A tibble: 2 × 3
  followed_treatment Count Percent
               <dbl> <int>   <dbl>
1                  0    12   0.286
2                  1    30   0.714
# tabulate followed_treatment in the control group
cntrl_dat <- data_1 |>
  filter(treatment == "Control") |>
  select(followed_treatment) |>
  group_by(followed_treatment) |>
  summarize(Count = n()) |>
  mutate(Percent = Count/sum(Count))
print(cntrl_dat)
# A tibble: 2 × 3
  followed_treatment Count Percent
               <dbl> <int>   <dbl>
1                  0     3     0.3
2                  1     7     0.7
# compliance rate (percentage points): 71.4% of the treatment group vs 30.0% of the control group learned their status
compliance_rate <- 71.4-30.0
print(compliance_rate)
[1] 41.4
Execution in Stata
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
keep if hadsex12 == 1 & hiv2004 == 1
drop if missing(tinc) | missing(got) | missing(anycond)
generate followed_treatment = cond(treatment == 1, got, 1-got)
tab followed_treatment if treatment == 0
tab followed_treatment if treatment == 1
dis "Compliance rate =" 71.4 - 30


. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear

. generate treatment = cond(tinc>0, 1, 0)

. label define treatment 0 "Control" 1 "Treatment"

. label val treatment treatment

. keep if hadsex12 == 1 & hiv2004 == 1
(4,698 observations deleted)

. drop if missing(tinc) | missing(got) | missing(anycond)
(70 observations deleted)

. generate followed_treatment = cond(treatment == 1, got, 1-got)

. tab followed_treatment if treatment == 0

followed_tr |
    eatment |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          3       30.00       30.00
          1 |          7       70.00      100.00
------------+-----------------------------------
      Total |         10      100.00

. tab followed_treatment if treatment == 1

followed_tr |
    eatment |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         12       28.57       28.57
          1 |         30       71.43      100.00
------------+-----------------------------------
      Total |         42      100.00

. dis "Compliance rate =" 71.4 - 30
Compliance rate =41.4

Here, 71.4% of the treatment group learned their HIV status, compared with 30% of the control group. Because followed_treatment equals 1 - got in the control group, the 30% is the share with followed_treatment = 0 in the control-group table. The compliance rate is the difference between the share of respondents who learned their status in the treatment group (71.43%) and the corresponding share in the control group (30%), so the compliance rate for this subgroup is about 41.4 percentage points (71.43% - 30%).
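Typing rounded percentages by hand (71.4 - 30.0) is easy to get slightly wrong. A minimal sketch of computing the compliance rate directly as a difference in the mean of got between arms is below. Because the study file is not bundled here, it uses a hypothetical data frame toy rebuilt from the counts in the tables above (30 of 42 treated respondents and 3 of 10 control respondents learned their status); it is a stand-in, not the study data.

```r
# Hypothetical stand-in for data_1, rebuilt from the counts in the tables above:
# treatment arm: 30 of 42 learned their status; control arm: 3 of 10.
toy <- data.frame(
  treatment = rep(c("Treatment", "Control"), times = c(42, 10)),
  got       = c(rep(1, 30), rep(0, 12), rep(1, 3), rep(0, 7))
)

# Share of each arm that learned its HIV status
share_treat   <- mean(toy$got[toy$treatment == "Treatment"])
share_control <- mean(toy$got[toy$treatment == "Control"])

# Compliance rate in percentage points, without rounding the inputs first
compliance_rate <- 100 * (share_treat - share_control)
print(round(compliance_rate, 2))
```

With the study data, the same two mean() lines can be applied to data_1, since comparing a factor with a character label such as "Treatment" works in R.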

Calculating the Intent-to-Treat (ITT) Effect and the Local Average Treatment Effect (LATE)

In R, we use the lm_robust() function from the estimatr package to run a regression with HC1 heteroskedasticity-robust standard errors. In Stata, the regress command with the robust option produces the same HC1 standard errors.

Execution in R
reg <- lm_robust(anycond ~ treatment, data = data_1, se_type = "HC1")
summary(reg)

Call:
lm_robust(formula = anycond ~ treatment, data = data_1, se_type = "HC1")

Standard error type:  HC1 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept)          0.2000     0.1290   1.550   0.1273 -0.05910   0.4591 50
treatmentTreatment   0.2286     0.1507   1.517   0.1356 -0.07408   0.5312 50

Multiple R-squared:  0.03429 ,  Adjusted R-squared:  0.01497 
F-statistic: 2.301 on 1 and 50 DF,  p-value: 0.1356
Execution in Stata
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
keep if hadsex12 == 1 & hiv2004 == 1
drop if missing(tinc) | missing(got) | missing(anycond)
regress anycond treatment, robust


. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear

. generate treatment = cond(tinc>0, 1, 0)

. label define treatment 0 "Control" 1 "Treatment"

. label val treatment treatment

. keep if hadsex12 == 1 & hiv2004 == 1
(4,698 observations deleted)

. drop if missing(tinc) | missing(got) | missing(anycond)
(70 observations deleted)

. regress anycond treatment, robust

Linear regression                               Number of obs     =         52
                                                F(1, 50)          =       2.30
                                                Prob > F          =     0.1356
                                                R-squared         =     0.0343
                                                Root MSE          =     .48756

------------------------------------------------------------------------------
             |               Robust
     anycond |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treatment |   .2285714   .1506789     1.52   0.136    -.0740761     .531219
       _cons |         .2   .1289961     1.55   0.127    -.0590963    .4590963
------------------------------------------------------------------------------

The intercept shows that 20% of sexually active HIV-positive respondents who were not offered an incentive purchased condoms. Respondents who were offered an incentive were 22.86 percentage points more likely to purchase condoms. This is the intent-to-treat (ITT) effect: the effect of being offered the incentive, regardless of whether the respondent actually learned their status. Although the point estimate is positive, it is not statistically significant at conventional levels (p = 0.136).
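Both numbers in this regression can be reproduced in base R without the study file. With a single 0/1 regressor, the OLS slope is the difference in mean anycond between the arms, and the HC0 variance of that slope is the within-arm (population) variance divided by arm size, summed over the two arms; HC1 rescales it by n/(n - k). The sketch below assumes a hypothetical data frame toy whose anycond values reproduce the reported arm means (2 of 10 in control, matching the 0.20 intercept; 18 of 42 in treatment, matching 0.20 + 0.2286). It is an illustration, not the study data.

```r
# Hypothetical stand-in for data_1 (base R only). The anycond values reproduce
# the reported arm means: 2 of 10 in control (0.20), 18 of 42 in treatment (0.4286).
toy <- data.frame(
  treat   = rep(c(1, 0), times = c(42, 10)),
  anycond = c(rep(1, 18), rep(0, 24), rep(1, 2), rep(0, 8))
)

# OLS with a single 0/1 regressor: the slope is the difference in arm means (ITT)
fit <- lm(anycond ~ treat, data = toy)
itt <- unname(coef(fit)["treat"])

# HC1 robust SE by hand: HC0 variance is the sum over arms of the within-arm
# population variance divided by arm size; HC1 rescales it by n / (n - k)
pvar <- function(x) mean((x - mean(x))^2)
y1 <- toy$anycond[toy$treat == 1]
y0 <- toy$anycond[toy$treat == 0]
n <- nrow(toy)
k <- 2  # intercept + treat
se_hc1 <- sqrt((pvar(y1) / length(y1) + pvar(y0) / length(y0)) * n / (n - k))

print(round(c(itt = itt, se_hc1 = se_hc1), 4))
```

The printed values agree with the coefficient (0.2286) and robust standard error (0.1507) reported above, which shows that the ITT regression is a difference in means with an HC1 standard error.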

Next, we regress got on treatment (the first stage) and combine its coefficient with the ITT estimate to calculate the Local Average Treatment Effect (LATE).

Note

LATE = Intent to Treat / Compliance rate
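Writing Y for anycond, D for got and Z for the randomized treatment indicator, this ratio is the Wald estimator, with sample means in place of expectations. As a worked check, the arm means implied by the output (20% and 42.86% for anycond from the ITT regression; 30% and 71.43% for got from the compliance tables) give:

$$
\text{LATE} = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}
= \frac{\tfrac{18}{42} - \tfrac{2}{10}}{\tfrac{30}{42} - \tfrac{3}{10}}
= \frac{8/35}{29/70}
= \frac{16}{29} \approx 0.5517
$$

The numerator is the ITT effect (0.2286) and the denominator is the compliance rate (0.4143).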

Execution in R
reg_1 <- lm_robust(got ~ treatment, data = data_1, se_type = "HC1")
summary(reg_1)

Call:
lm_robust(formula = got ~ treatment, data = data_1, se_type = "HC1")

Standard error type:  HC1 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept)          0.3000     0.1478   2.030  0.04769 0.003168   0.5968 50
treatmentTreatment   0.4143     0.1640   2.526  0.01474 0.084898   0.7437 50

Multiple R-squared:  0.115 ,    Adjusted R-squared:  0.09727 
F-statistic: 6.382 on 1 and 50 DF,  p-value: 0.01474
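# LATE = ITT / compliance rate, taken from the fitted models to avoid transcription errors
LATE <- unname(coef(reg)["treatmentTreatment"] / coef(reg_1)["treatmentTreatment"])
print(LATE)
[1] 0.5517241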
Execution in Stata
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
keep if hadsex12 == 1 & hiv2004 == 1
drop if missing(tinc) | missing(got) | missing(anycond)
regress got treatment, robust
dis "LATE =" 0.2285714/0.4142857


. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear

. generate treatment = cond(tinc>0, 1, 0)

. label define treatment 0 "Control" 1 "Treatment"

. label val treatment treatment

. keep if hadsex12 == 1 & hiv2004 == 1
(4,698 observations deleted)

. drop if missing(tinc) | missing(got) | missing(anycond)
(70 observations deleted)

. regress got treatment, robust

Linear regression                               Number of obs     =         52
                                                F(1, 50)          =       6.38
                                                Prob > F          =     0.0147
                                                R-squared         =     0.1150
                                                Root MSE          =     .46198

------------------------------------------------------------------------------
             |               Robust
         got |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treatment |   .4142857   .1639922     2.53   0.015     .0848976    .7436738
       _cons |         .3   .1477836     2.03   0.048     .0031679    .5968321
------------------------------------------------------------------------------

. dis "LATE =" 0.2285714/0.4142857
LATE =.55172409

The coefficient on treatment in this first-stage regression (0.4143) equals the compliance rate we calculated earlier. Dividing the ITT effect by it, we estimate that among compliers, that is, sexually active HIV-positive respondents who would learn their status if offered an incentive but not otherwise, learning one’s HIV status increases the probability of purchasing condoms by about 55 percentage points (0.5517). However, computing the LATE as a ratio by hand does not give a standard error, so we cannot tell whether the estimate is statistically significant. An alternative is to estimate the LATE by two-stage least squares (2SLS), using treatment as an instrument for got.

Execution in R

iv_reg <- iv_robust(anycond ~ got | treatment, data = data_1, se_type = "HC1")
summary(iv_reg)

Call:
iv_robust(formula = anycond ~ got | treatment, data = data_1, 
    se_type = "HC1")

Standard error type:  HC1 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  CI Lower CI Upper DF
(Intercept)  0.03448     0.1561  0.2208  0.82612 -0.279152   0.3481 50
got          0.55172     0.2729  2.0219  0.04855  0.003643   1.0998 50

Multiple R-squared:  0.1776 ,   Adjusted R-squared:  0.1612 
F-statistic: 4.088 on 1 and 50 DF,  p-value: 0.04855
Execution in Stata
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
keep if hadsex12 == 1 & hiv2004 == 1
drop if missing(tinc) | missing(got) | missing(anycond)
ivregress 2sls anycond (got = treatment), robust


. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear

. generate treatment = cond(tinc>0, 1, 0)

. label define treatment 0 "Control" 1 "Treatment"

. label val treatment treatment

. keep if hadsex12 == 1 & hiv2004 == 1
(4,698 observations deleted)

. drop if missing(tinc) | missing(got) | missing(anycond)
(70 observations deleted)

. ivregress 2sls anycond (got = treatment), robust

Instrumental variables (2SLS) regression          Number of obs   =         52
                                                  Wald chi2(1)    =       4.25
                                                  Prob > chi2     =     0.0392
                                                  R-squared       =     0.1776
                                                  Root MSE        =     .44118

------------------------------------------------------------------------------
             |               Robust
     anycond |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         got |   .5517241   .2675736     2.06   0.039     .0272896    1.076159
       _cons |   .0344828   .1531167     0.23   0.822    -.2656204    .3345859
------------------------------------------------------------------------------
Instrumented:  got
Instruments:   treatment

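The 2SLS coefficient on got (0.5517) is identical to the LATE we calculated by hand. The robust standard errors differ slightly between R (0.2729) and Stata (0.2676): iv_robust() with se_type = "HC1" applies the small-sample factor n/(n - k), whereas ivregress with the robust option does not and reports z statistics, so adding the small option to ivregress should bring the two in line. The p-value for got is 0.049 in R and 0.039 in Stata, below 0.05 in both cases. We therefore conclude that, among compliers in this subgroup of sexually active HIV-positive respondents, learning one’s HIV status increases the likelihood of purchasing condoms, and the effect is statistically significant at the 5% level.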
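Why does 2SLS reproduce the hand-calculated ratio exactly? With one binary instrument and no covariates, the first-stage fitted values of got take only two values (the arm means), so the second-stage slope equals the ITT effect divided by the first-stage effect. The sketch below runs the two stages by hand with base lm() on a hypothetical data frame toy rebuilt from the arm-level means reported above (30 of 42 and 3 of 10 learned their status; 18 of 42 and 2 of 10 bought condoms). The pairing of got and anycond within each arm is arbitrary; this does not affect the point estimate, but no standard error from this toy data should be read as the study's.

```r
# Hypothetical stand-in for data_1, rebuilt from arm-level means reported above.
# Within each arm, the pairing of got and anycond is arbitrary.
toy <- data.frame(
  treat   = rep(c(1, 0), times = c(42, 10)),
  got     = c(rep(1, 30), rep(0, 12), rep(1, 3), rep(0, 7)),
  anycond = c(rep(1, 18), rep(0, 24), rep(1, 2), rep(0, 8))
)

# Stage 1: regress got on the instrument and keep the fitted values
first_stage <- lm(got ~ treat, data = toy)
toy$got_hat <- fitted(first_stage)

# Stage 2: regress the outcome on the fitted values (point estimate only;
# these lm() standard errors are NOT valid 2SLS standard errors)
second_stage <- lm(anycond ~ got_hat, data = toy)
late_2sls <- unname(coef(second_stage)["got_hat"])

# Wald ratio: ITT effect divided by the first-stage effect
diff_means <- function(v) mean(v[toy$treat == 1]) - mean(v[toy$treat == 0])
late_wald <- diff_means(toy$anycond) / diff_means(toy$got)

print(round(c(late_2sls = late_2sls, late_wald = late_wald), 4))
```

Both printed values equal 0.5517, and the second-stage intercept equals the 0.0345 constant reported by iv_robust() and ivregress. Because the naive second stage uses got_hat in place of got when forming residuals, its standard errors are wrong, which is why the dedicated iv_robust() and ivregress commands are used above.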